Abstract

A theory of quantitative inference about the parameters of sampling 
distributions is constructed deductively by following very general rules, 
referred to as the Cox-Pólya-Jaynes Desiderata. The inferences are
made in terms of probability distributions that are assigned to the 
parameters. The Desiderata, focusing primarily on consistency of the 
plausible reasoning, lead to unique assignments of these probabilities in 
the case of sampling distributions that are invariant under Lie groups. 
In the scalar cases, e.g. in the case of inferring a single location or scale 
parameter, the requirement for logical consistency is equivalent to the 
requirement for calibration: the consistent probability distributions 
are automatically also the ones with the exact calibration and vice 
versa. This equivalence speaks in favour of reconciliation between the 
Bayesian and the frequentist schools of reasoning. 

1 Introduction 

A theory of quantitative inference about the parameters of sampling distri- 
butions is formulated with special attention being paid to the consistency of 
the theory and to its ability to make verifiable predictions. In the present ar- 
ticle only basic concepts of the theory and their most important applications 
are presented while details can be found elsewhere [1]. 

Let $p(x_1|\theta I)$ be the probability for a random variate $x$ to take
the value $x_1$ (to take a value in an interval $(x_1, x_1 + dx)$ in the
case of a continuous variate), given the family $I$ of sampling
distributions, and the value $\theta$ of the parameter that specifies a
unique distribution within the family (for example, a sampling distribution
from the exponential family $I$, $\tau^{-1} \exp\{-x/\tau\}$, is uniquely
determined by the value of the parameter $\tau$). An inference about the
parameter is made by specifying a real number, called (degree of)
plausibility, $(\theta|x_1 x_2 \cdots I)$, to represent our degree of
belief that the value of the (continuous) parameter lies within an interval
$(\theta, \theta + d\theta)$. Every such plausibility is conditioned upon
the information that consists of the measured value(s) $x_1, x_2, \ldots$
of the sampling variate and of the specified family $I$ of sampling
distributions.

We assume all considered plausibilities to be subject to very general
requirements, referred to as the Cox-Pólya-Jaynes (CPJ) Desiderata [2,1],
focusing primarily on consistency of the plausible reasoning. The
requirement of consistency can be regarded as the first of the requirements
to be satisfied by every theoretical system, be it empirical or
non-empirical. An empirical system, however, besides being consistent, must
also be falsifiable [3]. We therefore added a Desideratum to the CPJ
Desiderata, requiring that the predictions of the theory must be verifiable
so that, in principle, they may be refuted.

It should be stressed that in this way the list of basic rules is completed. 
That is, the entire theory of inference about the parameters is built deduc- 
tively from the aforementioned Desiderata: in order not to jeopardize the 
consistency of the theory no additional ad hoc principles are invoked. 

2 Cox's and Bayes' Theorems 

Richard Cox showed [4] that a system for manipulating plausibilities is
either isomorphic to the probability system or inconsistent (i.e. in
contradiction with the CPJ Desiderata). Without any loss of generality we
therefore once and for all choose probabilities $p(\theta|x_1 I)$ among all
possible plausibility functions $(\theta|x_1 I)$ to represent our degree of
belief in particular values of inferred parameters. In this way the
so-called inverse probabilities, $p(\theta|x_1 I)$, and the so-called
direct (or sampling) probabilities, $p(x_1|\theta I)$, become subject to
identical rules.

Transformations of probability distributions that are induced by variate
transformations are also uniquely determined by the Desiderata. Let
$f(x|\theta I)$ be the probability density function (pdf) for a continuous
random variate $x$ so that its probability distribution is expressible as

$$ p(x|\theta I) = f(x|\theta I)\,dx . \tag{1} $$

Then, if the variate $x$ is subject to a one-to-one transformation
$x \to y = g(x)$, the pdf for $y$ reads:

$$ f(y|\theta I') = f(x|\theta I) \left|\frac{dy}{dx}\right|^{-1} \tag{2} $$

(by using the symbol $I'$ instead of $I$ on the left-hand side of (2) it is
stressed that the above transformations may in general alter the form of
the sampling distribution). Since the direct and the inverse probabilities
are subject to the same rules, the transformation of the pdf for the
inferred parameter, $f(\theta|xI)$, under a one-to-one transformation
$\theta \to \nu = g(\theta)$ is analogous to the transformation of the
sampling pdf:

$$ f(\nu|xI) = f(\theta|xI) \left|\frac{d\nu}{d\theta}\right|^{-1} . \tag{3} $$
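As a numerical illustration of the transformation rule (2), the following Python sketch takes the exponential sampling pdf from the Introduction and applies the transformation y = ln x. The choice of transformation, the value of tau, the integration range and all function names are assumptions made only for this example. The check is that the transformed density still integrates to one.

```python
import math

def f_x(x, tau):
    """Exponential sampling pdf tau^-1 exp(-x/tau) for x > 0."""
    return math.exp(-x / tau) / tau

def f_y(y, tau):
    """Pdf of y = ln x obtained from (2): f(x) * |dy/dx|^-1 with dy/dx = 1/x."""
    x = math.exp(y)
    return f_x(x, tau) * x

def integrate(func, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

tau = 2.0  # assumed value, for illustration only
# The range (-30, 10) in y covers essentially all of the probability mass.
total = integrate(lambda y: f_y(y, tau), -30.0, 10.0)
print(f"integral of transformed pdf: {total:.6f}")
```

Omitting the Jacobian factor in `f_y` would leave a function that no longer integrates to one, which is a quick way to see why the factor is required.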

Once the probabilities are chosen, the usual product and sum rules [2] 
become the fundamental equations for manipulating the probabilities, while 
many other equations follow from the repeated applications of the two. In 
this way, for example, Bayes' Theorem for updating the probabilities can be 
obtained: 

$$ f(\theta|x_1 x_2 I) = \frac{f(\theta|x_1 I)\,p(x_2|\theta x_1 I)}{\int f(\theta'|x_1 I)\,p(x_2|\theta' x_1 I)\,d\theta'} . \tag{4} $$

Here $f(\theta|x_1 I)$ denotes the pdf for $\theta$ based on $x_1$ and $I$
only (i.e. prior to taking datum $x_2$ into account),
$p(x_2|\theta x_1 I)$ is the probability for $x_2$ (the so-called
likelihood) given values $\theta$ and $x_1$, while the integral in the
denominator on the right-hand side ensures appropriate normalization of the
updated pdf $f(\theta|x_1 x_2 I)$ for $\theta$ (i.e. the pdf for $\theta$
posterior to taking $x_2$ into account).

Bayes' Theorem (4) allows only for updating pdf's $f(\theta|x_1 I)$ that
were already assigned prior to their updating. Consequently, the existing
applications of our basic rules must be extended in order to allow for
assignment of probability distributions to the parameters, with such
assignments representing natural and indispensable starting points in every
sequential updating of probability distributions.
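The updating step (4) can be sketched on a parameter grid. The example below assumes the exponential family with independent data, so that the likelihood of the new datum does not depend on the earlier one. The starting pdf, the data values, the grid and the function names are all hypothetical choices made for illustration; in particular the starting pdf is simply taken as given, since Bayes' Theorem itself does not say how to assign it.

```python
import math

def likelihood(x, tau):
    """Exponential sampling pdf tau^-1 exp(-x/tau)."""
    return math.exp(-x / tau) / tau

def normalize(values, h):
    """Scale grid values so that their sum times the grid step equals one."""
    total = h * sum(values)
    return [v / total for v in values]

def bayes_update(pdf, grid, h, x_new):
    """Equation (4) on a grid: multiply by the likelihood, then renormalize."""
    return normalize([p * likelihood(x_new, t) for p, t in zip(pdf, grid)], h)

# Grid over the parameter tau (range and resolution are illustrative choices).
h = 0.01
grid = [h * (i + 0.5) for i in range(20000)]   # tau in (0, 200)

x1, x2 = 1.3, 0.7                              # hypothetical data
# Starting pdf f(tau|x1 I): assumed to be already assigned; this particular
# form is an assumption of the example, not something (4) provides.
start = normalize([x1 / t**2 * math.exp(-x1 / t) for t in grid], h)
updated = bayes_update(start, grid, h, x2)

mode_after = grid[max(range(len(grid)), key=updated.__getitem__)]
print(f"mode of updated pdf: {mode_after:.3f}")
```

The denominator of (4) appears here as the call to `normalize`: it carries no information about tau and only rescales the product of the starting pdf and the likelihood.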

3 Consistency Theorem 

According to the CPJ Desiderata, the pdf for $\theta$ should be invariant
under reversing the order of taking into account two independent
measurements of the sampling variate $x$. This is true if and only if the
pdf that is assigned to $\theta$ on the basis of a single measurement of
$x$ is directly proportional to the likelihood for that measurement,

$$ f(\theta|xI) = \frac{\pi(\theta)\,p(x|\theta I)}{\int \pi(\theta')\,p(x|\theta' I)\,d\theta'} , \tag{5} $$

where $\pi(\theta)$ is the consistency factor while the integral in the
denominator on the right-hand side of (5) again ensures correct
normalization of $f(\theta|xI)$.
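The order-invariance requirement can be checked numerically. The sketch below assumes the exponential family and, purely for illustration, a consistency factor proportional to 1/tau. It assigns a pdf from one datum, updates it with the other by Bayes' Theorem (4), and compares the two possible orders. A deliberately inconsistent rule (likelihood squared) is included only for contrast. Data values, grid and function names are hypothetical.

```python
import math

def likelihood(x, tau):
    """Exponential sampling pdf tau^-1 exp(-x/tau)."""
    return math.exp(-x / tau) / tau

def normalize(values, h):
    """Scale grid values so that their sum times the grid step equals one."""
    total = h * sum(values)
    return [v / total for v in values]

def assign(x, grid, h, power=1.0):
    """Assign f(tau|x I) proportional to pi(tau) * likelihood**power.

    power = 1 is the form required by (5); any other power is a deliberately
    inconsistent rule used only for contrast. pi(tau) = 1/tau is assumed.
    """
    return normalize([likelihood(x, t) ** power / t for t in grid], h)

def update(pdf, x_new, grid, h):
    """Bayes' Theorem (4) for an independent new datum."""
    return normalize([p * likelihood(x_new, t) for p, t in zip(pdf, grid)], h)

def order_gap(x1, x2, grid, h, power):
    """Largest pointwise difference between the two orders of processing."""
    a = update(assign(x1, grid, h, power), x2, grid, h)
    b = update(assign(x2, grid, h, power), x1, grid, h)
    return max(abs(u - v) for u, v in zip(a, b))

h = 0.01
grid = [h * (i + 0.5) for i in range(5000)]   # tau in (0, 50), illustrative
x1, x2 = 1.3, 0.4                             # hypothetical data

gap_consistent = order_gap(x1, x2, grid, h, power=1.0)
gap_inconsistent = order_gap(x1, x2, grid, h, power=2.0)
print(gap_consistent, gap_inconsistent)
```

With the assignment proportional to the likelihood, the two orders agree up to rounding error; with the squared likelihood they do not, because the datum used for the assignment is then weighted differently from the datum used for the update.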

There is a remarkable similarity between Bayes' Theorem (4), applicable for
updating the probabilities, and the Consistency Theorem (5), applicable for
assigning the probability distributions to the values of the inferred
parameters, but there is also a fundamental and very important difference
between the two. While $f(\theta|x_1 I)$ in the former represents the pdf
for $\theta$ prior to taking datum $x_2$ into account, $\pi(\theta)$ in the
latter is (by construction of the Consistency Theorem [1]) just a
proportionality coefficient between the pdf for $\theta$ and the
appropriate likelihood $p(x|\theta I)$, so that no probabilistic inference
is ever to be made on the consistency factor alone, nor can $\pi(\theta)$
be subject to the normalization requirement that is otherwise perfectly
legitimate in the case of prior pdf's.

The form of the consistency factor depends on the only relevant information
that we possess before the first datum is collected, i.e. it depends on the
specified sampling model. Consequently, when assigning probability
distributions to the parameters of the sampling distributions from the same
family $I$, the assignment must be made according to the Consistency
Theorem by using consistency factors of forms that are identical up to
(irrelevant) multiplicative constants.

4 Consistency Factor 

According to (3) and (5) combined, the consistency factors $\pi(\theta)$
for $\theta$ and $\tilde{\pi}(g(\theta))$ for the transformed parameter
$g(\theta)$ are related as

$$ \tilde{\pi}(g(\theta)) = k\,\pi(\theta)\,|g'(\theta)|^{-1} , \tag{6} $$

where $k$ is an arbitrary constant (i.e. its value is independent of either
$x$ or $\theta$), while $g'(\theta)$ denotes the derivative of $g(\theta)$
with respect to $\theta$. However, for the parameters of sampling
distributions with the form $I$ that is invariant under simultaneous
transformations $g_a(x)$ and $g_a(\theta)$ of the sample and the parameter
space,

$$ f(g_a(x)|g_a(\theta) I') = f(x|\theta I)\,|g_a'(x)|^{-1} = f(g_a(x)|g_a(\theta) I) \tag{7} $$

(i.e. when $I' = I$), $\tilde{\pi}$ and $\pi$ must be identical functions
up to a multiplicative constant, so that (6) reads:

$$ \pi(g_a(\theta)) = k(a)\,\pi(\theta)\,|g_a'(\theta)|^{-1} . \tag{8} $$

Index $a$ in the above expressions indicates parameters of the
transformations and $k$, in general, can be a function of $a$. In the case
of multi-parametric transformation groups the derivative $g_a'(\theta)$ is
to be substituted by the appropriate Jacobian.

The above functional equation has a unique solution for the transformations
$g_a(\theta)$ with a continuous range of admissible values $a$, i.e. if
the set of admissible transformations $g_a(\theta)$ forms a Lie group. If a
sampling distribution for $x$ is invariant under a Lie group, then it is
necessarily reducible (by separate one-to-one transformations of the
sampling variate $x \to y$ and of the parameter $\theta \to \mu$) to a
sampling distribution that can be expressed as a function of a single
variable $y - \mu$, $f(y|\mu I) = \phi(y - \mu)$. Sampling distributions of
the form $\sigma^{-1}\psi(x/\sigma)$ are examples of such distributions: by
substitutions $y = \ln x$ and $\mu = \ln\sigma$ they transform into
$\phi(y - \mu) = \exp\{y - \mu\}\,\psi(\exp\{y - \mu\})$ (the scale
parameters $\sigma$ are reduced to location parameters $\mu$).
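The reduction of a scale parameter to a location parameter can be verified directly. The sketch below assumes the shape function psi(u) = exp(-u), i.e. the exponential family used earlier; this choice, the test points and the function names are illustrative assumptions. It evaluates the pdf of y = ln x for several pairs (y, mu) that share the same difference y - mu and compares the results with the claimed location form.

```python
import math

def psi(u):
    """Example shape function: psi(u) = exp(-u) for u > 0 (exponential family)."""
    return math.exp(-u)

def f_scale(x, sigma):
    """Scale-family pdf sigma^-1 psi(x/sigma)."""
    return psi(x / sigma) / sigma

def f_log(y, mu):
    """Pdf of y = ln x with mu = ln sigma, via the Jacobian |dx/dy| = exp(y)."""
    return f_scale(math.exp(y), math.exp(mu)) * math.exp(y)

def phi(z):
    """Claimed location form: phi(z) = exp(z) * psi(exp(z))."""
    return math.exp(z) * psi(math.exp(z))

# The transformed pdf should depend on (y, mu) only through y - mu.
pairs = [(0.3, -1.0), (2.3, 1.0), (-4.7, -6.0)]   # all have y - mu = 1.3
values = [f_log(y, mu) for y, mu in pairs]
print(values, phi(1.3))
```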

It is therefore sufficient to determine the form of consistency factors for
the location parameter $\mu$ since we can always make use of (6) to
transform $\pi(\mu = g(\theta))$ into the appropriate consistency factor
$\pi(\theta)$ for the original parameter $\theta$. Sampling distributions
of the form $\phi(x - \mu)$ are invariant under simultaneous translations
$x \to x + a$ and $\mu \to \mu + a$; $\forall a \in (-\infty, \infty)$,
and the functional equation (8) in the case of the translation group reads

$$ \pi(\mu + a) = k(a)\,\pi(\mu) , \tag{9} $$

implying the consistency factor for the location parameters to be
$\pi(\mu) \propto \exp\{-q\mu\}$, with $q$ being an arbitrary constant.
Accordingly, $\pi(\sigma) \propto \sigma^{-(q+1)}$ is the appropriate form
of the consistency factor for the scale parameters.
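A short numerical check can confirm that the stated forms satisfy the functional equations. The sketch assumes k(a) = exp(-q a) for the translation group in (9) and k(a) = a^(-q) for the scaling group sigma -> a sigma in (8); these choices of k, the test values and the function names are assumptions of the example.

```python
import math

def pi_location(mu, q):
    """Candidate consistency factor for a location parameter."""
    return math.exp(-q * mu)

def pi_scale(sigma, q):
    """Candidate consistency factor for a scale parameter."""
    return sigma ** (-(q + 1.0))

def max_residual(q, shifts=(0.5, 2.0, 3.5), points=(0.2, 1.0, 4.0)):
    """Largest relative mismatch between both sides of (9) and of (8)."""
    worst = 0.0
    for a in shifts:
        for t in points:
            # (9): mu -> mu + a with k(a) = exp(-q a)
            lhs, rhs = pi_location(t + a, q), math.exp(-q * a) * pi_location(t, q)
            worst = max(worst, abs(lhs - rhs) / rhs)
            # (8): sigma -> a sigma, |g_a'(sigma)| = a, with k(a) = a**(-q)
            lhs, rhs = pi_scale(a * t, q), a ** (-q) * pi_scale(t, q) / a
            worst = max(worst, abs(lhs - rhs) / rhs)
    return worst

residual = max_residual(q=0.7)   # q = 0.7 is an arbitrary illustrative value
print(f"largest relative residual: {residual:.2e}")
```

The check passes for any q, which reflects the point made in the text: the one-parameter groups alone leave q undetermined.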

The value of $q$ is then uniquely determined by recognizing the fact that
sampling distributions of the forms $\phi(x - \mu)$ and
$\sigma^{-1}\psi(x/\sigma)$ are just special cases of two-parametric
sampling distributions

$$ f(x|\mu\sigma I) = \frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right) , \tag{10} $$

with $\sigma$ being fixed to unity and with $\mu$ being fixed to zero,
respectively. The consistency factor $\pi(\mu)$ therefore corresponds to
assigning pdf's $f(\mu|\sigma xI)$ while $\pi(\sigma)$ is to be used when
assigning $f(\sigma|\mu xI)$. When neither $\sigma$ nor $\mu$ is fixed,
however, the pdf (10) is invariant under a two-parametric group of
transformations, $x \to ax + b$, $\mu \to a\mu + b$ and
$\sigma \to a\sigma$; $\forall a \in (0, \infty)$ and
$\forall b \in (-\infty, \infty)$, and the functional equation (8) for the
consistency factor $\pi(\mu, \sigma)$ for assigning $f(\mu\sigma|xI)$ reads

$$ \pi(a\mu + b, a\sigma) = \frac{k(a,b)}{a^2}\,\pi(\mu, \sigma) , \tag{11} $$

so that $\pi(\mu, \sigma)$ is to be proportional to $\sigma^{-r}$, $r$
being an arbitrary constant. According to the product rule,
$f(\mu\sigma|xI)$ can be factorized as

$$ f(\mu\sigma|xI) = f(\mu|\sigma xI)\,f(\sigma|xI) = f(\sigma|\mu xI)\,f(\mu|xI) , \tag{12} $$

where $f(\sigma|xI)$ and $f(\mu|xI)$ are the marginal pdf's, e.g.

$$ f(\sigma|xI) = \int f(\mu'\sigma|xI)\,d\mu' . \tag{13} $$

The equalities (12) are achieved if and only if $q = 0$ and $r = 1$, i.e.
if the three consistency factors, determined uniquely up to arbitrary
multiplicative constants, read:

$$ \pi(\mu) = 1 \quad\text{and}\quad \pi(\sigma) = \pi(\mu, \sigma) = \sigma^{-1} . \tag{14} $$



5 Calibration 

In order to exceed the level of mere speculation, the theory of
probabilistic inference about the parameters must be able to make
predictions that can be verified (or falsified) by experiments. Let
therefore a random variate $x$ be subject to a family of sampling
distributions $I$ and let several independent values $x_i$ of the variate
be recorded. The predictions of the theory are made in terms of
probabilities

$$ P(\theta \in (\theta_{i,1}, \theta_{i,2})|x_i I) = \int_{\theta_{i,1}}^{\theta_{i,2}} f(\theta'|x_i I)\,d\theta' = \delta \tag{15} $$

that, given the measured value $x_i$ of the sampling variate, an interval
$(\theta_{i,1}, \theta_{i,2})$ contains the actual value of the parameter
$\theta$ of the sampling distribution. For the sake of simplicity, the
intervals are chosen in such a way that the probabilities $\delta$ are
equal in each of the assignments. The predictions are then verifiable
through long-term relative frequencies: our probability judgments (15) are
said to be calibrated if the fraction of inferences with the specified
intervals containing the actual value of the parameter coincides with
$\delta$.
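The calibration requirement can be probed by simulation. The sketch below assumes the exponential family from the Introduction and the consistency factor pi(tau) = 1/tau from (14). With a single datum x, the Consistency Theorem (5) then gives f(tau|x I) = x tau^-2 exp(-x/tau), whose distribution function in tau is exp(-x/tau), so interval endpoints follow in closed form. The central-interval convention, the values of tau and delta, the number of trials and the function names are assumptions of the example.

```python
import math
import random

def interval(x, delta):
    """Central interval with posterior probability delta for tau, given one datum x.

    Uses the posterior distribution function exp(-x/tau), which follows from
    the exponential likelihood combined with pi(tau) = 1/tau.
    """
    q_lo = (1.0 - delta) / 2.0
    q_hi = (1.0 + delta) / 2.0
    return -x / math.log(q_lo), -x / math.log(q_hi)

def coverage(tau_true, delta, trials, seed=0):
    """Fraction of simulated inferences whose interval contains tau_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.expovariate(1.0 / tau_true)   # one datum with mean tau_true
        lo, hi = interval(x, delta)
        hits += lo < tau_true < hi
    return hits / trials

delta = 0.68
frac = coverage(tau_true=2.0, delta=delta, trials=100000)
print(f"requested probability {delta}, observed fraction {frac:.3f}")
```

The observed fraction should agree with delta within statistical fluctuations, and it should do so for any true value of tau, which is what exact calibration means here.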

An exact calibration of an inference about a parameter $\theta$ is ensured
if the assigned pdf $f(\theta|xI)$ is related to the (cumulative)
distribution function $F(x, \theta)$ of the sampling variate as

$$ f(\theta|xI) = \left|\frac{\partial}{\partial\theta} F(x, \theta)\right| \tag{16} $$
and the consistency factors $\pi(\mu)$ and $\pi(\sigma)$ (14) do meet the
above requirement. Furthermore, if, besides being calibrated (16), the pdf
for $\theta$ is to be assigned according to the Consistency Theorem (5),
the distribution of the sampling variate $x$ is necessarily reducible to a
distribution of the form $\phi(y - \mu)$ [5]. But exactly the same
necessary condition was obtained by requiring invariance of the sampling
distribution under a Lie group, with such an invariance being indispensable
for determination of consistency factors solely by imposing consistency on
the assignment of pdf's. Imposing logical consistency on the theory is thus
equivalent to imposing calibration on its predictions: every probabilistic
inference about a parameter of a sampling distribution that we are sure is
consistent will at the same time also be calibrated and, vice versa, every
calibrated inference, based on a posterior pdf that is factorized according
to (5), will simultaneously be logically consistent, too. The equivalence
of the two requirements speaks in favour of reconciliation between the
(objective) Bayesian and the frequentist schools of reasoning, the former
paying attention primarily to logical consistency and the latter stressing
the importance of verifiable predictions.
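The link between the consistency factor and the calibration condition (16) can be checked for one concrete family. The sketch below assumes the exponential family, for which F(x, tau) = 1 - exp(-x/tau), and compares the magnitude of the numerical derivative of F with respect to tau against the pdf obtained from (5) with pi(tau) = 1/tau. The absolute value is taken because F decreases with tau in this family. Test points, step size and function names are illustrative assumptions.

```python
import math

def F(x, tau):
    """Distribution function of the exponential sampling distribution."""
    return 1.0 - math.exp(-x / tau)

def assigned_pdf(tau, x):
    """f(tau|x I) from (5) with pi(tau) = 1/tau: x tau^-2 exp(-x/tau)."""
    return x / tau**2 * math.exp(-x / tau)

def dF_dtau(x, tau, eps=1e-6):
    """Central-difference approximation of the derivative of F w.r.t. tau."""
    return (F(x, tau + eps) - F(x, tau - eps)) / (2.0 * eps)

max_gap = max(
    abs(abs(dF_dtau(x, tau)) - assigned_pdf(tau, x))
    for x in (0.5, 1.3)
    for tau in (0.4, 1.0, 5.0)
)
print(f"largest mismatch: {max_gap:.2e}")
```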



6 Consistency Lost and Regained 

Numerous examples can be found of sampling distributions lacking invariance
under Lie groups: there are sampling distributions for continuous random
variates (e.g. the Weibull distribution) that are not invariant under
continuous groups of transformations, the symmetry can be broken by
imposing constraints on the parameter spaces of otherwise invariant
sampling distributions, or the sampling space may be discrete (e.g. in
counting experiments), to name just three of the most common cases. No
consistent quantitative parameter inference is possible in such cases, but
under very general conditions the remedy is just to collect more data
relevant to the estimated parameters. Then, according to the Central Limit
Theorem, the discrete sampling distributions approach their dense
(Gaussian) limits, the constraints of the parameter spaces become more and
more irrelevant, and the sampling distributions of the maximum likelihood
estimates of the inferred parameter $\theta$ gain Gaussian shapes with
$\theta$ being the location parameters of the latter, so the ability to
make consistent inferences is regained.
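The approach to the Gaussian limit can be illustrated with a counting experiment. The sketch below assumes n independent Poisson counts with mean mu, an example of a discrete sampling space chosen here for illustration. Their sum is Poisson with mean lam = n * mu, and the maximum likelihood estimate of mu is that sum divided by n. The code measures the largest gap between the Poisson distribution function and a Gaussian one with the same mean and variance, using a half-unit continuity correction; the correction, the values of lam and the function names are assumptions of the example.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k counts, computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_cdf_gap(lam):
    """Largest gap between the Poisson(lam) distribution function and its
    Gaussian approximation N(lam, lam), evaluated at k + 1/2."""
    k_max = int(lam + 12.0 * math.sqrt(lam) + 20)
    cdf, gap = 0.0, 0.0
    for k in range(k_max + 1):
        cdf += poisson_pmf(k, lam)
        gap = max(gap, abs(cdf - normal_cdf((k + 0.5 - lam) / math.sqrt(lam))))
    return gap

# lam = n * mu grows as more counts are collected.
gaps = [max_cdf_gap(lam) for lam in (1.0, 10.0, 100.0, 1000.0)]
print(gaps)
```

The gap shrinks as lam grows, which is the sense in which collecting more data restores an approximately Gaussian sampling distribution with the inferred parameter acting as a location parameter.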

7 Consistency Preserved



Consistency factors are determined exclusively by utilizing tools, such as
the product rule (12) and marginalization (13), that are deducible directly
from the basic Desiderata: in order to preserve consistency of inference it
is crucial to refrain from using ad hoc shortcuts in the course of
inference. For regardless of how close to our intuitive reasoning these ad
hoc procedures may be, how well they may have performed in some other
previous inferences, and how respectable their names may sound (e.g. the
principle of insufficient reason or its sophisticated version, the
principle of maximum entropy, the principle of group invariance, the
principle of maximum likelihood, and the principle of reduction), they are
all found in general to lead to inferences that are neither consistent nor
calibrated.